👁 Vision Language Model - ali.mouizina · Scour

VLM-UQBench: A Benchmark for Modality-Specific and Cross-Modality Uncertainties in Vision Language Models

arxiv.org·1d

PaddleOCR-VL-1.5: A 0.9B Vision-Language OCR Model Built for Real-World Documents

hackernoon.com·1d

🎯Object Detection

Language Model Inversion through End-to-End Differentiation

arxiv.org·14h

UbiquitousLearning/mllm: Fast Multimodal LLM on Mobile Devices

github.com·10h

Show HN: A segmentation model client-side via WASM

qtoolkit.dev·6h

🎯Object Detection

Multi-TPC: A Multimodal Dataset for Three-Party Conversations with Speech, Motion, and Gaze

nature.com·16h

👁Computer vision

New Ovis2.6-30B-A3B, a lil better than Qwen3-VL-30B-A3B

huggingface.co·7h

Gibbs Measures from Deep Shaped Multilayer Perceptrons

link.aps.org·6h

Large Language Models for Mortals book

andrewpwheeler.com·1d

The “Think in Pictures” Upgrade for Multimodal Models

hackernoon.com·16h

🎯Object Detection

Building a Production-Grade Autonomous LLM Agent with Tool Use, Memory, and Multimodal Capabilities

pub.towardsai.net·1d

How Transformer Architecture Powers LLMs

dev.to·6h

Ming-flash-omni-2.0: 100B MoE (6B active) omni-modal model - unified speech/SFX/music generation

huggingface.co·1h

A History of Large Language Models

gregorygundersen.com·18h

Wavelet Meets Adam: Compressing Gradients for Memory-Efficient Training

chipublib.idm.oclc.org·1d

The feature space for drifting models

breno.bearblog.dev·1d

👁Computer vision

Karpathy's Micro LLM in JavaScript

github.com·3h

EyesOff: Why Some Models Quantize Better Than Others

ym2132.github.io·20h

AI Image Editors

trendhunter.com·1d

👁Computer vision

A C implementation of the inference pipeline for the Mistral AI’s Voxtral Realtime 4B model

blog.adafruit.com·2h
